Pattern based Outlier Detection in Mixed-Attribute Datasets

Authors

  • Mandar Katdare
  • Warren Jin
Abstract

Outlier detection in mixed-attribute datasets has proved to be a challenging task in real-world applications. Most existing outlier detection algorithms do not consider the interactions between categorical and numerical attributes. The Pattern based Outlier Detection (POD) algorithm (Zhang & Jin, 2011) has had considerable success in detecting outliers by analysing such interactions. This report presents the Pattern based Outlier Detection using Support Vector Machines (POD-SVM) algorithm, which tries to improve upon the results of POD. POD-SVM uses Support Vector Machines to detect patterns in datasets and thereby calculate an outlier score for each record in a mixed-attribute dataset. Experiments illustrate that POD-SVM provides results that are at least as good as those obtained by POD.

Similar resources

An Effective Pattern Based Outlier Detection Approach for Mixed Attribute Data

Detecting outliers in mixed attribute datasets is one of the major challenges in real world applications. Existing outlier detection methods lack effectiveness for mixed attribute datasets mainly due to their inability to consider interactions among different types of attributes, e.g., numerical and categorical. To address this issue in mixed attribute datasets, we propose a novel Pattern based ...

Outlier Detection on Mixed-Type Data: An Energy-Based Approach

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. I...

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

Fast outlier detection using rough sets theory

In many Knowledge Discovery applications, finding outliers is more interesting than finding inliers in a dataset. Outliers are perceived as rare cases in a dataset, described as abnormal data in the information table. Outlier detection is applied in many important applications, such as fraud detection systems, to uncover suspicious objects which may have important knowledge...

Screening Tools for Data Quality and Outlier Detection Applied to the Airbase Ambient Air Pollution Database

Systematic collection of long-term meso- to large-scale datasets of ambient air quality provides an indispensable means for air pollution monitoring. However, the quality of these monitoring data depends on the chosen method of measurements and the QA/QC procedures applied. We present a consolidated screening tool for the automatic detection of outliers in large data volume air quality monitoring...

Publication date: 2011